Abstract. Chernoff's bound bounds a tail probability (i.e. $\Pr(X \ge a)$, where $a > EX$). Assuming that the
distribution of $X$ is $Q$, the logarithm of the bound is known to be equal to the value of relative entropy (or minus
Kullback-Leibler distance) for the I-projection $\hat{P}$ of $Q$ on a set $\mathcal{H} = \{P : E_P X = a\}$. Here, Chernoff's bound is related
to Maximum Likelihood on exponential form, and implications for the notion of complementarity
are discussed. Moreover, a novel form of the bound is proposed, which expresses the value of Chernoff's
bound directly in terms of the I-projection (or generalized I-projection).



INTRODUCTION 

Originally developed as an asymptotic result for
partial sums of random variables, Chernoff's bound
[1] was later recognized to be valid 'for any n'.
This permits formulating Chernoff's bound in the
following form.

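Theorem 1. Let $X$ be a random variable such that
$E e^{\theta v(X)} < \infty$ for all $\theta \in \mathbb{R}$, where $v(X)$ is a concave,
non-decreasing function of $X$. Let $a > EX$, $a \in \mathbb{R}$.
Then

$$\log P(X \ge a) \le \min_{\theta \in \mathbb{R}} \left[ \log E e^{\theta v(X)} - \theta v(a) \right], \tag{1a}$$

or, equivalently,

$$P(X \ge a) \le \min_{\theta \in \mathbb{R}} \frac{E e^{\theta v(X)}}{e^{\theta v(a)}}. \tag{1b}$$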



Since a proof of the Theorem (see for instance [2]) 
will be used in the sequel, it will be recalled here. 

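Proof. Since $e^{\theta X}$ is a nonnegative and monotone
function of $X$, for $\theta > 0$ it is increasing in $X$. By
assumption $v(X)$ is a non-decreasing function of $X$,
so the event $X \ge a$ implies $v(X) \ge v(a)$.
Thus, by Markov's inequality, for $\theta > 0$

$$P(X \ge a) = P(\theta X \ge \theta a) \le P(\theta v(X) \ge \theta v(a)) = P\left(e^{\theta v(X)} \ge e^{\theta v(a)}\right) \le \frac{E e^{\theta v(X)}}{e^{\theta v(a)}}.$$

The inequality holds trivially for $\theta = 0$, thus the
tightest bound is achieved by minimizing the right-hand
side expression over $\theta \ge 0$.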



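To show that

$$\arg\min_{\theta \ge 0} \frac{E e^{\theta v(X)}}{e^{\theta v(a)}} = \arg\min_{\theta \in \mathbb{R}} \frac{E e^{\theta v(X)}}{e^{\theta v(a)}},$$

apply Jensen's inequality both to the exponential
function and to $v(\cdot)$, then recall that $a > EX$
and consequently realize that the point of minimum of
$E e^{\theta (v(X) - v(a))}$ should occur for a non-negative value
of $\theta$.
Hence,

$$P(X \ge a) \le \min_{\theta \in \mathbb{R}} \frac{E e^{\theta v(X)}}{e^{\theta v(a)}}.$$

□

Notation: Let us denote

$$\hat{\theta} \triangleq \arg\min_{\theta \in \mathbb{R}} \left[ \log E e^{\theta v(X)} - \theta v(a) \right]. \tag{2}$$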



The entire right-hand sides of (1a) and (1b) will be
denoted $C(a, v(\cdot), \hat{\theta})$ and $C_P(a, v(\cdot), \hat{\theta})$, respectively.
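
As an illustration of this notation (an added worked case, not taken from the original text), assume $X$ is standard normal and $v(X) = X$, which is concave and non-decreasing. Then $E e^{\theta X} = e^{\theta^2/2}$ is finite for all $\theta$, and for any $a > EX = 0$:

$$\begin{aligned}
\hat{\theta} &= \arg\min_{\theta \in \mathbb{R}} \left[ \tfrac{1}{2}\theta^{2} - \theta a \right] = a, \\
C(a, v(\cdot), \hat{\theta}) &= \tfrac{1}{2}a^{2} - a^{2} = -\tfrac{1}{2}a^{2}, \\
C_P(a, v(\cdot), \hat{\theta}) &= e^{-a^{2}/2},
\end{aligned}$$

so that (1b) reads $P(X \ge a) \le e^{-a^{2}/2}$. Note that the minimizer $\hat{\theta} = a$ is positive, in line with the argument in the proof.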

While it may appear at first glance surprising, 
Chernoff's bound on tail probability for a single ran- 
dom variable can be expressed in terms of quantities 
related to a random sample of asymptotic size. This 
is recalled and summarized in the next two sections. 
The last, relatively self-standing section, introduces 
a novel form/interpretation of Chernoff's bound. 



CHERNOFF'S BOUND AS A MINIMUM 
OF I-DIVERGENCE 

In this and the next section it will be assumed that
$X$ is either a continuous random variable with pdf
$g(X)$ defined on a support $S$, or a discrete random
variable with an m-element pmf $q$.

First, the continuous case. Let $\mathcal{H}$ denote a class of
pdf's, $\mathcal{H} = \{f : E_f v(X) = v(a)\}$. Consider the following
I-divergence minimization task, which consists of
selecting a pdf $\hat{f}(X)$ from the class $\mathcal{H}$ that is closest
to $g(X)$, where the closeness is measured by
I-divergence (or I-distance)

$$I(f \| g) = E_f \log \frac{f(X)}{g(X)}.$$



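Employing calculus of variations, it is possible to
show (see for instance [3]) that the unique solution
(in open form) of the above task is

$$\hat{f}(x) = \frac{g(x) e^{\tilde{\theta} v(x)}}{\int_S g(x) e^{\tilde{\theta} v(x)}\,dx} = g(x)\, e^{\tilde{\theta} v(x) - \log E_g e^{\tilde{\theta} v(X)}},$$

where $\tilde{\theta}$ is a solution of

$$E_{\hat{f}}\, v(X) = v(a). \tag{3}$$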



Consequently, it can be easily seen that the value
of the I-divergence for the pdf $\hat{f}(x)$ closest to $g(x)$ in
the class $\mathcal{H}$ is

$$I(\hat{f} \| g \mid \hat{f} \in \mathcal{H}) = \tilde{\theta} v(a) - \log E_g e^{\tilde{\theta} v(X)}.$$
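
The step behind this value can be written out as follows (an added derivation, writing $\hat{f}$ for the minimizing pdf in its exponential form and $\tilde{\theta}$ for the solution of (3)):

$$\begin{aligned}
I(\hat{f} \| g) &= E_{\hat{f}} \log \frac{\hat{f}(X)}{g(X)}
 = E_{\hat{f}} \left[ \tilde{\theta} v(X) - \log E_g e^{\tilde{\theta} v(X)} \right] \\
 &= \tilde{\theta}\, E_{\hat{f}}\, v(X) - \log E_g e^{\tilde{\theta} v(X)}
 = \tilde{\theta} v(a) - \log E_g e^{\tilde{\theta} v(X)},
\end{aligned}$$

where the last equality uses the constraint (3). Up to sign, the right-hand side has the same form as the objective minimized in (2).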



Recalling the convex analysis duality theorem (see
for instance [4]), it can be shown that $\tilde{\theta}$ which solves
(3) and $\hat{\theta}$ of (2) are the same.

Thus,

$$C(a, v(\cdot), \hat{\theta}) = -I(\hat{f} \| g \mid \hat{f} \in \mathcal{H}). \tag{4}$$



In words, the logarithm of the tail probability of
obtaining a value not smaller than $a$ is bounded by the
negative of the value of the I-distance of the pdf $\hat{f}(x)$
closest to $g(x)$ in the class $\mathcal{H}$ of all pdf's with value
of $E_f v(X)$ just equal to $v(a)$.

Equivalent to the I-divergence minimization task
is a relative-entropy maximization task (since relative
entropy $H(f \| g) = -I(f \| g)$), thus

$$C(a, v(\cdot), \hat{\theta}) = H(\hat{f} \| g \mid \hat{f} \in \mathcal{H}). \tag{5}$$



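The discrete case allows for deeper reading. Let
now $\mathcal{H}$ denote a class of pmf's, $\mathcal{H} = \{p : E_p v(X) = v(a)\}$.
The relative entropy maximization (REM)
task

$$\arg\max_{p \in \mathcal{H}} \; -\sum_{i=1}^{m} p_i \log \frac{p_i}{q_i} \tag{6}$$

is solved by $\hat{p}_i = q_i e^{\hat{\theta} v(x_i) - \log E_q e^{\hat{\theta} v(X)}}$, where $\hat{\theta}$
solves $E_{\hat{p}}\, v(X) = v(a)$. Consequently, arguing along
the same line as in the continuous case leads to the
conclusion similar to (5),

$$C(a, v(\cdot), \hat{\theta}) = H(\hat{p} \| q \mid \hat{p} \in \mathcal{H}), \tag{7}$$

which can now be followed further to get

$$C_P(a, v(\cdot), \hat{\theta}) = \prod_{i=1}^{m} \left( \frac{q_i}{\hat{p}_i} \right)^{\hat{p}_i}. \tag{8}$$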

Recalling the MaxProb justification of REM (see
[5]) it can be noted that $\hat{p}$ is a limit of a sequence of
the most probable occurrence vectors; in this way
Chernoff's bound becomes related to a random sample
of asymptotic size.

Example. Let $X$ be defined on support
[1 2 3 ... 8] with pmf q = [0.05 0.4 0.2 0.15 0.10
0.07 0.02 0.01]. Thus $EX = 3.19$. Setting $a = 4$ we ask
for the tail probability $P(X \ge 4)$, which is obviously 0.35.
The pmf closest to q in I-divergence can be found to
be $\hat{p}$ = [0.0236 0.2526 0.1692 0.1699 0.1517 0.1422
0.0544 0.0364]. Chernoff's bound calculated by
(8) then gives the value 0.8829. For $a = 5$ it gives
0.5675, as compared to the true value 0.2; for $a = 6$ it gives
0.27 (true value is 0.1); and for $a = 7$ it gives 0.087
(true value is 0.03).
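
The numbers in the Example can be checked with a short script. The following is a minimal sketch, assuming $v(X) = X$ (the Example does not state $v$ explicitly, but this choice reproduces the reported $\hat{p}$); the function names and the bisection bracket `[0, 20]` are illustrative choices, not part of the original text.

```python
import math

def tilt(q, xs, theta):
    """Exponentially tilted pmf: p_i proportional to q_i * exp(theta * x_i)."""
    w = [qi * math.exp(theta * x) for qi, x in zip(q, xs)]
    z = sum(w)
    return [wi / z for wi in w]

def solve_theta(q, xs, a, lo=0.0, hi=20.0, iters=200):
    """Bisection for theta such that the tilted mean equals a.

    The tilted mean is increasing in theta; the bracket [lo, hi] is an
    assumption that suits a > EX and a below the largest support point.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(p * x for p, x in zip(tilt(q, xs, mid), xs))
        if mean < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chernoff_direct(q, xs, a):
    """Form (1b) with v(X) = X: E_q exp(theta * (X - a)) at the minimizing theta."""
    t = solve_theta(q, xs, a)
    return sum(qi * math.exp(t * (x - a)) for qi, x in zip(q, xs))

def chernoff_product(q, xs, a):
    """Form (8): product over i of (q_i / p_i) ** p_i for the tilted pmf p."""
    p = tilt(q, xs, solve_theta(q, xs, a))
    return math.exp(sum(pi * math.log(qi / pi) for pi, qi in zip(p, q)))

xs = list(range(1, 9))
q = [0.05, 0.4, 0.2, 0.15, 0.10, 0.07, 0.02, 0.01]
for a in (4, 5, 6, 7):
    tail = sum(qi for qi, x in zip(q, xs) if x >= a)
    print(a, round(tail, 2), round(chernoff_product(q, xs, a), 4))
```

Each printed row holds $a$, the true tail probability and the bound from (8); the bounds should come out close to the values quoted in the Example, and `chernoff_direct` should agree with `chernoff_product` up to rounding.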



CHERNOFF'S BOUND AND MAXIMUM 
LIKELIHOOD 

Let us assume a random sample $X = x$ of size $n$,
such that

$$\frac{1}{n} \sum_{i=1}^{n} v(x_i) = v(a),$$

where $a$, $v(\cdot)$ are given.

Let the supposed population from which the sample
came be of the following exponential form

$$p_i(\theta) = q_i\, e^{\theta v(x_i) - \log E_q e^{\theta v(X)}},$$

where $q$ is a pmf, thus $p$ is the exponentially tilted $q$.

The maximum likelihood (ML) task lies in searching
out a value of $\theta$ which is the most likely to generate
the sample $x$. The ML estimator $\hat{\theta}_{ML}$ of $\theta$ is known
to be the solution of the likelihood equation, which
is now just

$$v(a) = \frac{E_q\, v(X) e^{\theta v(X)}}{E_q\, e^{\theta v(X)}}.$$

Thus, $\hat{\theta}_{ML} = \hat{\theta}$ (see also [3]).
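
The coincidence of the two estimates can be illustrated numerically. The sketch below assumes $v(X) = X$, reuses the pmf $q$ of the Example, and uses a hypothetical sample of size 20 (the `counts` vector is invented for illustration) whose sample mean is exactly 4. It maximizes the log-likelihood of the exponential form by ternary search and, separately, minimizes the objective of (2); the search interval `[-5, 5]` is an assumption.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8]
q = [0.05, 0.4, 0.2, 0.15, 0.10, 0.07, 0.02, 0.01]
counts = [2, 3, 4, 4, 2, 2, 2, 1]   # hypothetical occurrence counts n_i
n = sum(counts)                      # sample size, here 20
a = sum(c * x for c, x in zip(counts, xs)) / n   # sample mean, here 4.0

def log_mgf(theta):
    """log E_q exp(theta * X)."""
    return math.log(sum(qi * math.exp(theta * x) for qi, x in zip(q, xs)))

def loglik(theta):
    """Log-likelihood of the exponentially tilted pmf for the given counts."""
    lm = log_mgf(theta)
    return sum(c * (math.log(qi) + theta * x - lm)
               for c, qi, x in zip(counts, q, xs))

def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# ML estimate: maximizer of the log-likelihood.
theta_ml = argmax_concave(loglik, -5.0, 5.0)
# Chernoff parameter of (2): minimizer of log E_q e^{theta X} - theta * a.
theta_hat = argmax_concave(lambda t: -(log_mgf(t) - t * a), -5.0, 5.0)

# Per-observation log-likelihood at the maximum versus
# sum_i (n_i / n) log q_i - C(a, v, theta_hat).
lhs = loglik(theta_ml) / n
rhs = (sum(c / n * math.log(qi) for c, qi in zip(counts, q))
       - (log_mgf(theta_hat) - theta_hat * a))
print(theta_ml, theta_hat, lhs, rhs)
```

The two parameter values should agree up to the tolerance of the search. The equality of `lhs` and `rhs` follows directly from the definition of the log-likelihood of the exponential form together with the sample constraint.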



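It is then interesting to relate Chernoff's bound to
the above ML task. The log-likelihood is

$$l(\theta) = \sum_{i=1}^{m} n_i \left( \log q_i + \theta v(x_i) - \log \sum_{j=1}^{m} q_j e^{\theta v(x_j)} \right),$$

where $n_i$ is the number of occurrences of the i-th element of the support
in the sample. So,

$$\frac{1}{n}\, l(\hat{\theta}) = \sum_{i=1}^{m} \frac{n_i}{n} \log q_i - C(a, v(\cdot), \hat{\theta}), \tag{9a}$$

or equivalently, with $L_{\hat{\theta}}$ denoting the likelihood at
maximum,

$$C_P(\cdot) = \sqrt[n]{\frac{\prod_{i=1}^{m} q_i^{n_i}}{L_{\hat{\theta}}}}, \tag{9b}$$

which establish ML-Chernoff's bound links.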

Do they? For instance, (9b) combined with (8)
leads to the conclusion

$$\prod_{i=1}^{m} \left( \frac{q_i}{\hat{p}_i} \right)^{\hat{p}_i} = \prod_{i=1}^{m} \left( \frac{q_i}{\hat{p}_i} \right)^{n_i/n},$$

which is false, except for the case when $\hat{p}_i = n_i/n$, $i = 1, 2, \ldots, m$ ¹.
This case happens to appear just for
the random sample of asymptotic size, which resolves
the contradiction, since REM is indeed the method
which operates with a random sample of infinite size
(cf. [5], or [6]).

ML and REM tasks are complementary regardless
of sample size (see [3]). But, as the above 'deduction'
shows, the objective functions of the two tasks (maximum
likelihood and relative entropy, respectively) attain a
compatible relationship only when an infinite sample
size is assumed. And this is indeed the case, because
REM requires the assumption that the random sample
is infinite.

At the asymptotic, thanks to a conditional weak
law of large numbers (see [6]), Chernoff's bound is
linked to the exponential form Maximum Likelihood
by

$$\frac{l(\hat{\theta}_{ML})}{n} \xrightarrow{\;p\;} \sum_{i=1}^{m} \hat{p}_i \log q_i - C(\cdot),$$

which leads further to the conclusion (similar in
spirit to the Asymptotic Equipartition Property)

$$\frac{l(\hat{\theta}_{ML})}{n} \xrightarrow{\;p\;} -H(\hat{p} \mid \hat{p} \in \mathcal{H}),$$

where $H(p) = -\sum_i p_i \log p_i$ is Shannon's entropy.



¹ And except for the trivial case $q_i/\hat{p}_i = 1/m$, for all $i$.



NEW FORM OF CHERNOFF'S BOUND 

The logarithm of the tail probability $\log \Pr(X \ge a)$
cannot exceed the negative of the convex conjugate of the cumulant
generating function of the random variable $v(X)$, evaluated at $v(a)$;
this is a statement of the 'log-Chernoff bound' (recall
(1a)) for the log-tail-probability. Assuming that the
distribution of $X$ is $Q$, the value of the log-Chernoff
bound becomes equal to the negative of the value of
the Kullback-Leibler distance (I-divergence) for the
I-projection $\hat{P}$ of $Q$ on a set $\mathcal{H} = \{P : E_P v(X) = v(a)\}$,
recall (4). Under this assumption, the value of Chernoff's
bound can also be expressed directly in terms
of the I-projection, as will be shown here.

In order to make it relatively self-contained and
precise, it will be given in terms of measure theory
and I-projection (see [7]). Though the presented
variant of Chernoff's bound is the same in the case
of a discrete random variable as in the case
of a continuous one, each case will be discussed
under different existence considerations, hence its
formulation is split into separate theorems.



Discrete measure 

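Theorem 2. Let $(\Omega, \mathcal{F}, Q)$ be a countable probability
space and let $X : \Omega \to \mathbb{R}$ be a random variable taking
values $\{x_1, x_2, \ldots\}$. Let $a \in \mathbb{R}$ be such that $a > E_Q X$.
Assume that $E_Q e^{\theta X} < \infty$ for all $\theta \in \mathbb{R}$. Let $\mathcal{P}$ denote
the class of all probability measures on $(\Omega, \mathcal{F})$ and
$\mathcal{H} = \{P \in \mathcal{P} : E_P X = a\}$. If $a$ is in the convex hull of
$\{x_1, x_2, \ldots\}$ then $\mathcal{H}$ is non-empty. Assume this to be the case. Let
$\hat{P}$ be the I-projection of $Q$ on $\mathcal{H}$, that is $I(\hat{P} \| Q) =
\inf_{P \in \mathcal{H}} I(P \| Q)$. If $I(\hat{P} \| Q)$ is finite, then

$$Q(\omega : X(\omega) \ge a) \le \frac{Q(a)}{\hat{P}(a)}.$$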

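Proof. To save space, let $p_i = P\{x_i\}$, $q_i = Q\{x_i\}$,
$\hat{p}_i = \hat{P}\{x_i\}$.

By ([4], Thm II.5.2, Thm VIII.3.1), under the
assumptions, the I-projection of $Q$ on $\mathcal{H}$ exists, it
is unique, and has the following form

$$\hat{p}_i = q_i\, e^{\hat{\lambda} x_i - \log E_Q e^{\hat{\lambda} X}}, \tag{10}$$

where

$$\hat{\lambda} = \arg\min_{\lambda \in \mathbb{R}} E_Q e^{\lambda (X - a)} \tag{11}$$

exists and it is unique.

Since $a > E_Q X$ and $E_Q e^{\theta X} < \infty$ for all $\theta \in \mathbb{R}$, the
standard proof of Chernoff's bound (see the Introduction)
guarantees that

$$\min_{\theta \in \mathbb{R}} E_Q e^{\theta (X - a)} \ge Q(X \ge a),$$

or, with use of (11),

$$E_Q e^{\hat{\lambda} (X - a)} \ge Q(X \ge a). \tag{12}$$

Noting that $\hat{P}(a) = Q(a)\, e^{\hat{\lambda} a - \log E_Q e^{\hat{\lambda} X}}$ then shows
that the LHS of (12) is just $Q(a)/\hat{P}(a)$, which completes
the proof. □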

Note 1. The claim of Theorem 2 could be di- 
rectly extended by replacing X by any concave, non- 
decreasing and bounded function v(X) . 
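
The claim of Theorem 2 can be checked numerically on the pmf of the earlier Example. The sketch below is only an illustration under stated assumptions: the support is finite, $a$ is taken to be one of the support points so that $Q(a)$ and $\hat{P}(a)$ are point masses, and the helper names and bisection bracket are hypothetical.

```python
import math

xs = list(range(1, 9))
q = [0.05, 0.4, 0.2, 0.15, 0.10, 0.07, 0.02, 0.01]

def i_projection(q, xs, a, lo=0.0, hi=20.0, iters=200):
    """Return (lam, p): tilt parameter and tilted pmf with mean a, as in (10).

    lam is found by bisection on the tilted mean, which increases with lam.
    """
    def tilt(lam):
        w = [qi * math.exp(lam * x) for qi, x in zip(q, xs)]
        z = sum(w)
        return [wi / z for wi in w]

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(pi * x for pi, x in zip(tilt(mid), xs))
        if mean < a:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilt(lam)

def new_form_bound(q, xs, a):
    """Q(a) / P_hat(a); a is assumed to be a support point."""
    _, p = i_projection(q, xs, a)
    k = xs.index(a)
    return q[k] / p[k]

def classical_bound(q, xs, a):
    """E_Q exp(lam * (X - a)) at the minimizing lam, the LHS of (12)."""
    lam, _ = i_projection(q, xs, a)
    return sum(qi * math.exp(lam * (x - a)) for qi, x in zip(q, xs))

for a in (4, 5, 6, 7):
    print(a, new_form_bound(q, xs, a), classical_bound(q, xs, a))
```

For each $a$ the two printed values should coincide up to rounding, which is the content of the last step of the proof; for $a = 4$ the ratio is roughly 0.15/0.1699, in agreement with the value reported in the Example.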



Absolutely continuous measure 

Let now a measurable function $X : \Omega \to \mathbb{R}$, defined
on a probability space $(\Omega, \mathcal{F}, \mu)$, induce on $\mathbb{R}$ a law
$Q$ dominated by Lebesgue measure $\lambda$, so that its density
$q(X)$ with respect to $\lambda$ exists. Let $\mathcal{H}$ be a convex
set of laws $P$ on $\mathbb{R}$ whose densities $p(X)$ with respect
to Lebesgue measure exist. The I-projection $\hat{P}$ of $Q$ on
$\mathcal{H}$ is then such $\hat{P} \in \mathcal{H}$ that $I(\hat{P} \| Q) = \inf_{P \in \mathcal{H}} I(P \| Q)$,
where $I(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)}\, \lambda(dx)$. There, the conventions
$\log \frac{0}{0} = 0$, $\log \frac{b}{0} = +\infty$ are assumed ².

Assuming existence of the I-projection, the new form
of Chernoff's bound can be stated as follows:

Theorem 3. Let $v(X)$ be a concave and non-decreasing
function of $X$. Let $a > E_Q X$, $a \in \mathbb{R}$. Let
$\mathcal{H} = \{p : E_p v(X) = v(a)\}$. Let $\hat{p}(x)$, the density
corresponding to the I-projection of $Q$ on $\mathcal{H}$, exist. Let
$E_Q e^{\theta v(X)} < \infty$, $E_Q v(X) e^{\theta v(X)} < \infty$, for all $\theta \in \mathbb{R}$.
Then

$$\mu(\omega : X(\omega) \ge a) \le \frac{q(a)}{\hat{p}(a)},$$

provided that $q(a) \ne 0$, $\hat{p}(a) \ne 0$ and that $a$ is a
point where both $\hat{p}(X)$ and $q(X)$ are unique.

Proof. By Theorem 3.1 and Corollary 3.1 of [7] the
I-projection of $Q$ on $\mathcal{H}$ has a density with respect
to Lebesgue measure of the following open form
$p(x, \eta) = q(x)\, e^{\eta v(x) - \log E_Q e^{\eta v(X)}}$, which is closed by
$\hat{\eta}$ such that $E_{\hat{p}}\, v(X) = v(a)$. The density is unique, up
to a set $\mathcal{N}$ of measure zero.

By assumptions $E_Q e^{\theta v(X)} < \infty$, $\forall \theta \in \mathbb{R}$,
and $E_Q v(X) e^{\theta v(X)} < \infty$, $\forall \theta \in \mathbb{R}$, so
$\hat{\theta} = \arg\min_{\theta \in \mathbb{R}} E_Q e^{\theta (v(X) - v(a))}$ exists and it is unique.
The assumptions also guarantee that differentiation
of $E_Q e^{\theta (v(X) - v(a))}$ with respect to $\theta$ can be
performed under the integral (cf. [8], Theorem A(9.1)).
Consequently, it can be directly seen that $\hat{\theta}$ solves
$E_{\hat{p}}\, v(X) = v(a)$ and is identical with $\hat{\eta}$.

(The above argument could also be made by
invoking ([4], Thm VIII.3.1).)

It is assumed that $a \notin \mathcal{N}$ and that $\hat{p}(a)$ is different from zero,
as is also assumed of $q(a)$, thus

$$E_Q e^{\hat{\theta} (v(X) - v(a))} = \frac{q(a)}{\hat{p}(a)}. \tag{13}$$

The assumption $E_Q e^{\theta v(X)} < \infty$, $\forall \theta \in \mathbb{R}$, together
with the assumed properties of $v(\cdot)$ guarantees validity of
Chernoff's bound claim:

$$E_Q e^{\hat{\theta} (v(X) - v(a))} \ge \mu(X \ge a). \tag{14}$$

Comparing (13) and (14) completes the proof.

□

² The definition of I-projection was adapted from [7].
Throughout the paper log denotes the natural logarithm
(though it is in fact immaterial for the claims which are
made).
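
The identity (13) used in the proof of Theorem 3 can be spelled out as follows (an added derivation, writing $\hat{p}$ for the density of the I-projection in its exponential form with parameter $\hat{\theta}$, and assuming $q(a) \ne 0$):

$$\begin{aligned}
\frac{q(a)}{\hat{p}(a)}
 &= \frac{q(a)}{q(a)\, e^{\hat{\theta} v(a) - \log E_Q e^{\hat{\theta} v(X)}}}
 = e^{\log E_Q e^{\hat{\theta} v(X)} - \hat{\theta} v(a)} \\
 &= E_Q e^{\hat{\theta} v(X)} \cdot e^{-\hat{\theta} v(a)}
 = E_Q e^{\hat{\theta} (v(X) - v(a))}.
\end{aligned}$$

The factor $q(a)$ cancels, which is why only its being non-zero matters.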

As far as the existence of the I-projection is concerned,
Csiszár's work (see [7], discussion on pp. 151, 154
and Theorems 2.1, 3.2) implies that for the case
considered above, if $I(P \| Q) < \infty$ for some $P \in \mathcal{H}$ and
if $\mathcal{H} \ne \emptyset$ and if $v(X)$ is bounded, then the I-projection
$\hat{P}$ of $Q$ on $\mathcal{H}$ exists, it is unique, and has the form

$$\hat{p}(x) = q(x)\, e^{\hat{\theta} v(x) - \log E_Q e^{\hat{\theta} v(X)}}.$$

Though the I-projection may not exist in the
case of unbounded $v(X)$, the generalized
I-projection introduced by Topsøe (see [9]) and studied
further by Csiszár (see [10]) exists and takes
the exponential form, which, even in this case,
permits formulating Chernoff's bound in terms of the
generalized I-projection. This will be done after a
brief reminder of the generalized I-projection, which is
adapted from [10].

Let $(S, \mathcal{B})$ be a measurable space, $X$ a random variable,
and $P$, $Q$ two probability measures defined
on the measurable space. The I-divergence $I(P \| Q)$
between them is

$$I(P \| Q) = \begin{cases} \int \log(dP/dQ)\, dP & \text{if } P \ll Q, \\ +\infty & \text{otherwise,} \end{cases}$$

and let $\mathcal{H}$ be a set of probability measures on $(S, \mathcal{B})$.
Let

$$I(\mathcal{H} \| Q) = \inf_{P \in \mathcal{H}} I(P \| Q).$$

The generalized I-projection $\hat{P}$ of $Q$ on $\mathcal{H}$ is such
a probability measure, not necessarily in $\mathcal{H}$, that
every sequence of probability measures $P_n \in \mathcal{H}$ with
$I(P_n \| Q) \to I(\mathcal{H} \| Q)$ converges to $\hat{P}$ in variation.

Making use of Csiszár's results, the generalized
I-projection form of Chernoff's bound can be stated
as follows:

Theorem 4. Let $v(X)$ be a concave, non-decreasing,
not necessarily bounded function of $X$. Let $a > E_Q X$,
$a \in \mathbb{R}$. Let $E_Q e^{\theta (v(X) - v(a))}$ attain its minimum at
$\hat{\theta}$. Let $\mathcal{H} = \{P : E_P v(X) = v(a)\}$. Let $\hat{P}$ be the
generalized I-projection of $Q$ on $\mathcal{H}$.
Then

$$\Pr(X \ge a) \le \left[ \frac{d\hat{P}}{dQ}(a) \right]^{-1},$$

provided that $\frac{d\hat{P}}{dQ}(a)$ is unique and $\frac{d\hat{P}}{dQ}(a) \ne 0$.

Proof. Since $\hat{\theta}$ exists (by assumption), by (cf. [10],
p. 778) the generalized I-projection of $Q$ on $\mathcal{H}$ is

$$\frac{d\hat{P}}{dQ}(x) = e^{\hat{\theta} v(x) - \log E_Q e^{\hat{\theta} v(X)}}. \tag{15}$$

Thus, $1 / \frac{d\hat{P}}{dQ}(a)$ is just $E_Q e^{\hat{\theta} (v(X) - v(a))}$, i.e. the
value of Chernoff's bound, which bounds $\Pr(X \ge a)$. □